Fix: seed sequence permutation tests and make results independent of n_jobs #1234

Open: selmanozleyen wants to merge 7 commits into scverse:main from selmanozleyen:fix/seed-sequence-permutation-tests

@selmanozleyen selmanozleyen commented Jun 26, 2026

Member

Fixes: #1233 and fixes #1232 as a side effect.

Each permutation gets its own seed. The seeds are generated by `np.random.SeedSequence(root_seed).spawn(n)`, which avoids the correlated results from sequential seeds that the numpy docs warn about. I tried to use the best practice but since we need to use numba in the future our seeds can be only Sequence[int], instead of Sequence[np.random.SeedSequence]. Hence:

import numpy as np


def _spawn_seeds(seed: int | None, n: int) -> list[int]:
    # uint32 integer seeds, derived from independent SeedSequence children (avoids the correlation
    # of sequential seeds). uint32 is required to stay reproducible with numba/legacy RandomState,
    # which only accept uint32 integer seeds.
    return [int(s.generate_state(1)[0]) for s in np.random.SeedSequence(seed).spawn(n)]

Once you confirm, I will add warnings about this behavioral change to the docstrings and notebooks for users.
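
A minimal sketch of why per-permutation seeds make results independent of how the work is split, assuming a toy statistic. `spawn_seeds` mirrors `_spawn_seeds` above; `permuted_stat` and `run_chunked` are hypothetical names used only for illustration and are not squidpy functions.

```python
import numpy as np


def spawn_seeds(seed, n):
    # Mirrors _spawn_seeds: one uint32 integer seed per permutation.
    return [int(s.generate_state(1)[0]) for s in np.random.SeedSequence(seed).spawn(n)]


def permuted_stat(values, seed):
    # Hypothetical statistic: shuffle with a seed owned by this permutation only.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(values)
    return float(np.dot(values, shuffled))


def run_chunked(values, seeds, n_chunks):
    # Emulates splitting the permutations across n_chunks workers.
    results = [None] * len(seeds)
    for chunk in np.array_split(np.arange(len(seeds)), n_chunks):
        for i in chunk:
            results[i] = permuted_stat(values, seeds[i])
    return results


values = np.arange(10, dtype=float)
seeds = spawn_seeds(seed=0, n=6)
one_worker = run_chunked(values, seeds, n_chunks=1)
four_workers = run_chunked(values, seeds, n_chunks=4)
```

Because permutation `i` always receives `seeds[i]`, the chunk boundaries do not influence any individual result, so `one_worker` and `four_workers` hold the same values.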

@selmanozleyen selmanozleyen self-assigned this Jun 26, 2026
selmanozleyen and others added 2 commits June 29, 2026 13:59
The seeds for permutation/simulation tests are now spawned per
permutation from a numpy.random.SeedSequence. This makes results
independent of `n_jobs`/`backend`, but changes the results obtained
with a given `seed` relative to earlier squidpy versions.

Document this with a `.. versionchanged:: 1.8.4` note on the affected
public functions (`ligrec`, `nhood_enrichment`, `spatial_autocorr`,
`ripley`). The shared note for the permutation-based functions lives in
a single docrep template (`seed_versionchanged`) to avoid duplication;
`ripley` keeps a tailored note as it concerns simulations. Also add a
release-notes entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@selmanozleyen selmanozleyen force-pushed the fix/seed-sequence-permutation-tests branch from 9a6008e to dbc04f7 Compare June 29, 2026 13:01
@flying-sheep

Member

> I tried to use the best practice but since we need to use numba in the future our seeds can be only Sequence[int], instead of Sequence[np.random.SeedSequence]. Hence:

please explain: why does the seed have to go inside of numba? Numba supports numpy.random.Generator instances (to a degree).

@selmanozleyen

Member Author

> please explain: why does the seed have to go inside of numba? Numba supports numpy.random.Generator instances (to a degree).

Oof, I didn't know about this. Thank you! Then we can fully modernize as well
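
A rough sketch of what the Generator-based variant could look like in plain NumPy, assuming one `np.random.Generator` per permutation built directly from the spawned `SeedSequence` children. `spawn_generators` is a hypothetical helper, not code from this PR, and whether the required Generator methods compile under numba is not shown here.

```python
import numpy as np


def spawn_generators(seed, n):
    # Hypothetical helper: keep the SeedSequence children instead of
    # collapsing each one to a single uint32 integer.
    children = np.random.SeedSequence(seed).spawn(n)
    return [np.random.default_rng(child) for child in children]


# The same root seed yields the same per-permutation streams.
draws_a = [rng.permutation(10) for rng in spawn_generators(42, 3)]
draws_b = [rng.permutation(10) for rng in spawn_generators(42, 3)]
```

Passing the children straight to `default_rng` uses the full entropy of each child rather than a 32-bit summary of it.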

flying-sheep commented Jun 29, 2026

Member

First check whether the methods you need work. I tried before and it failed, since whichever distribution I tried didn’t work.

selmanozleyen commented Jun 30, 2026

Member Author

Now I remember why I discarded generators: it was documented that Generator is not thread safe. But it should be fine if each permutation gets its own generator that is used only once, I guess...

Another point is that ripley uses .uniform(). Numba doesn't guarantee bit-for-bit agreement with numpy on that, but I guess we can document it once we numbafy ripley. As long as it stays reproducible per version, it's good enough for me (which is the modern numpy Generator guarantee now anyway, from what I understand).
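
A small sketch of the "one generator per simulation" idea under threads, assuming pure NumPy with no numba involved. `simulate` and `run_simulations` are hypothetical names, and the uniform point pattern only stands in for whatever ripley actually simulates.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def simulate(child_seq, n_points=5):
    # Each task builds and consumes its own Generator, so no Generator
    # instance is ever shared between threads.
    rng = np.random.default_rng(child_seq)
    return rng.uniform(0.0, 1.0, size=(n_points, 2))


def run_simulations(seed, n_simulations, max_workers):
    children = np.random.SeedSequence(seed).spawn(n_simulations)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, whichever thread ran them.
        return list(pool.map(simulate, children))


serial = run_simulations(seed=0, n_simulations=8, max_workers=1)
threaded = run_simulations(seed=0, n_simulations=8, max_workers=4)
```

Since every simulation draws from its own stream, `serial` and `threaded` contain identical arrays regardless of the worker count.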

Development

Successfully merging this pull request may close these issues.

reproducable results indepent of n_jobs
ripley reuses same seed for every simulation once a seed is provided
